ABSTRACT

This paper was presented with other papers in a forum
dealing with statewide testing programs. The primary purpose of the
paper is to address practical considerations and methods of
resolution for large districts or states that are planning to
conduct large-scale testing or assessment programs with criterion-
or performance-referenced measures. The first section lists the
parameters and limits within which these programs generally operate.
These limits are translated into practical problems and decision
points. Methods of resolving the problems are then addressed, with
emphasis given to professional and community involvement. The
paper closes with comments on test validity and how it is affected by
these problems and concerns. (Author)
LARGE-SCALE OBJECTIVE REFERENCED TESTING:
SOME PRACTICAL PROBLEMS AND CONCERNS

I. Introduction 

An increasing number of states and large districts
are moving toward objective referenced testing. Mary Hall of
the Oregon Department of Education (1975) noted that twenty-
eight percent of the states now use objective referenced
tests, twenty percent implement eclectic approaches, and
fifty-two percent use norm referenced materials. With the
implementation of these types of examinations, a new set of
problems is faced and new sets of procedures are required.

In understanding these problems, we must first
examine some of the parameters within which we operate,
as the problems are closely akin to these limits. After
noting these parameters and problems, this paper will be
concerned with process requirements for solving the problems
and close with some thoughts on the validity of objective
referenced testing for large populations.

II. Parameters of Operation 
For states or large districts, limits are placed 
on the development of tests. Coupled with these limits are 
the requirements of objective referenced testing. 

The first of these is concerned with program
diagnostics. Any objective referenced test will have to
provide results in a format which can readily be used to
assess program strengths and weaknesses. This calls for
careful report design and more extensive and detailed
information. I must wholeheartedly endorse Lorrie Sheppard's
comments on designing the report as a first step (1975).

The second requirement is the provision of "semi-
diagnostic" individual student profiles. The word "semi-
diagnostic" is used because a totally diagnostic picture is
not practical with a large-scale program. With the need
for scoring thousands of answer sheets, the format is virtually
forced into multiple-choice, closed-ended types of questions.
The length of testing time, usually less than three or four
hours, precludes the in-depth diagnostic picture which would
be desired.

A third need is for fast turnaround. A test
which is to be of any value must have results returned
in a very short period of time. Results which are three or
four months stale prevent the program components from being
responsive to identified needs and prevent the proper use of
individual diagnostics.

III. Practical Problems
The limits noted above pose problems in
themselves in that they define a rather tight space of opera-
tion. These limits are compounded by other test construction
problems.

Probably the first problem of test development
will be the breadth of the subject matter to be tested. Regard-
less of the area, the very breadth of the field will
preclude the use of behavioral objectives with three items per
objective. For example, the simple specification of "adding
two two-digit numbers" could be expanded to five or more
behavioral objectives when considering addition with zero,
addition with carrying, addition without carrying, and vertical
and horizontal formats.

In addition to this concern, taxonomies or
classification systems for arranging the subject matter are
always open to question. For mathematics, the decisions are
relatively simple. For reading, however, organization is more
difficult because there are several differing theoretical
views on reading which have various implications for test
content as well as test organization. Because areas such as
history or civics are more complicated, they lack the degree of
basic agreement on content found in the basic skills. The
affective areas show even less central agreement, with a great
variation in the attributes to be measured and the dimensions
along which they should be assessed.

These concerns will rapidly be translated into
breadth versus depth decisions. Either a particular area
of the discipline may be tested in a thorough fashion or a
broad brush must be applied with the loss of discrete
information in particular sub-areas. Trade-offs are required
which will, in turn, require close consultation with the users.

Closely tied to the breadth/depth problem is that
of test balance. Decisions must be made on whether encoding
is more important than comprehension. Should study skills ...

All of these practical problems must be addressed
in terms of the potential test user. If the program is very
broad based, it is likely that little agreement will be found
on these test-building concerns. However, the tests will be
valid only to the degree that there is agreement with the
user on content. Consequently, mechanisms must be found to
aid the process of consensus in addressing these problems.

IV. Resolving Problems

In many respects, the process of resolving the
problems is more important than the product itself. If the
users are not directly involved, it is doubtful that the
tests will enjoy the acceptance that the test
makers might wish. Consequently, the establishment of good
working interaction and feedback channels is absolutely necessary
if the program is going to be effective.

In addressing the problems of breadth versus
depth and taxonomies, it is important to incorporate subject
matter specialists to assist in
determining operational definitions. Even more important
is the use of teachers. All too often, central staff and
test developers have rosy-eyed views of the reality of what
goes on in the classroom. Certainly, we must concern ourselves
with "should" questions, but we must not confuse them with
what is reality in the classroom.

Our concerns with test balance, breadth, and
consensus can be largely resolved by close working relations
with the users.
V. Comments on Validity
With any broad-based testing program, validity
problems will increase as the number of schools and districts
being tested increases. Likewise, validity concerns will be
a direct function of the degree of district and school
latitude in establishing their instructional objectives
and materials. With the strong tradition of local control
of education in the United States, state assessment programs
will always face validity questions. It is consequently
necessary that continuous re-validation of testing objectives
be undertaken. This constant re-validation necessitates a
further trade-off, in that a common core must be retained if we
are to have a worthwhile longitudinal data base.

As noted above, the further the content moves from areas of
high agreement, the more general the test content becomes
and the less valid the results potentially are for local use.
The absence of norm-referenced composite scores
provides distinct advantages in handling this validity concern,
since results are reported objective by objective. Districts
or schools which do not find particular objectives relevant
must be given the option of declaring those objectives and
their test scores non-relevant for them.

In addressing all of the practical problems and
procedures, it cannot be overstressed that the key to
resolution is in the quality of the interactions with the
user. Any beginning assessment program will have practical
problems and will require revisions. If the interactive
relationships are positive, clear and precise information
on where program improvements are needed will be received,
as well as the time and latitude to make these improvements.



